Learning Purposeful Behaviour in the Absence of Rewards

Authors

Marlos C. Machado

Michael H. Bowling

Abstract

Artificial intelligence is commonly defined as the ability to achieve goals in the world. In the reinforcement learning framework, goals are encoded as reward functions that guide agent behaviour, and the sum of observed rewards provides a notion of progress. However, some domains have no such reward signal, or have a reward signal so sparse as to appear absent. Without reward feedback, agent behaviour is typically random, often dithering aimlessly and lacking intentionality. In this paper we present an algorithm capable of learning purposeful behaviour in the absence of rewards. The algorithm proceeds by constructing temporally extended actions (options), through the identification of purposes that are "just out of reach" of the agent's current behaviour. These purposes establish intrinsic goals for the agent to learn, ultimately resulting in a suite of behaviours that encourage the agent to visit different parts of the state space. Moreover, the approach is particularly suited for settings where rewards are very sparse, and such behaviours can help in the exploration of the environment until reward is observed.

Similar resources

The mediating role of organizational purposeful forgetting in the influence of genuine leadership on organizational learning in Staff of Ministry of Petroleum

The purpose of this study was to investigate the effect of genuine leadership on organizational learning with regard to mediating variable of purposeful organizational forgetting. The methodology of study was applied in terms of purpose and descriptive-correlative in terms of implementation. The statistical population of the study consisted of 840 employees of headquarter of the Ministry of P...

The Necessity of Average Rewards in Cooperative Multirobot Learning

Learning can be an effective way for robot systems to deal with dynamic environments and changing task conditions. However, popular single-robot learning algorithms based on discounted rewards, such as Q-learning, do not achieve cooperation (i.e., purposeful division of labor) when applied to task-level multirobot systems. A task-level system is defined as one performing a mission that is decompo...

Delay discounting and its correlation with time perspective in medical interns

Introduction: Delay discounting (DD) means preferring small immediate rewards to large delayed rewards. This study aimed to assess delay discounting and the correlation of its findings with those of the Zimbardo Time Perspective Inventory (ZTPI). Method: In a cross-sectional study, DD and time perspective were investigated in 93 medical interns by means of computer software and the ZTPI. In d...

A comparative study of the effect of clinical training in fundamentals and techniques by role-playing versus the traditional method on the caring behaviours of nursing students

Background and Aim: Caring is a multidimensional nursing concept that can be actualized within the baccalaureate nursing curriculum through the purposeful teaching and student-centered learning of core values. Although the learning of caring is widely accepted, it has not been proved through research. The aim of this study was to assess and compare the effectiveness of clinical practice of...

Reinforcement Learning in Biologically-Inspired Collective Robotics: A Rough Set Approach

This thesis presents a rough set approach to reinforcement learning. This is made possible by considering behaviour patterns of learning agents in the context of approximation spaces. Rough set theory, introduced by Zdzisław Pawlak in the early 1980s, provides a ground for deriving pattern-based rewards within approximation spaces. Learning can be considered episodic. The framework provided by an...

Journal: CoRR

Volume: abs/1605.07700

Publication date: 2016
